Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libdrgn: improve C string reading efficiency #313

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pcc
Copy link
Contributor

@pcc pcc commented Jun 28, 2023

The current C string reading implementation is inefficient, especially for low bandwidth remote targets, as it needs to do a separate segment read (including a fresh page table lookup) for each character read. A more efficient approach is to retain the page table between character reads, only discarding it when we hit the null terminator.

Implement this approach by allowing segments to also specify a C string reading callback. The callback for page tables will preserve the page table iterator while reading characters.

@pcc pcc force-pushed the read-cstr branch 2 times, most recently from b3f4466 to ce815d7 Compare June 28, 2023 02:35
@pcc
Copy link
Contributor Author

pcc commented Jul 1, 2023

This will need some fixups if #310 lands first, as we'll need to canonicalize the address passed to drgn_memory_reader_read_cstr as well.

The current C string reading implementation is inefficient, especially for
low bandwidth remote targets, as it needs to do a separate segment read
(including a fresh page table lookup) for each character read. A more
efficient approach is to retain the page table between character reads,
only discarding it when we hit the null terminator.

Implement this approach by allowing segments to also specify a C string
reading callback. The callback for page tables will preserve the page
table iterator while reading characters.

Signed-off-by: Peter Collingbourne <[email protected]>
@osandov
Copy link
Owner

osandov commented Oct 7, 2023

I don't love the idea of adding another callback for this. I've thought about this inefficiency in the past, and the approach that I was considering was to have drgn_program_read_c_string() read in chunks larger than 1 byte (e.g., up to the next 4k boundary, or 64 bytes, or the minimum of those two; we could experiment with different heuristics). The complication is that the memory reader interface is currently all-or-nothing; either your whole memory read succeeds, or it fails and the whole buffer is potentially garbage. So I'd like to make the memory reader interface allow partial reads.

This might be less optimal in a few edge cases, but I think it's good enough, and it's a bit cleaner. There are other use cases that I'd like to allow partial reads for anyways (e.g., for batched reads of all of the struct pages in the system while handling #27).

First of all, do you think this approach would solve your problem? Secondly, would you be interested in implementing it? If so, I have some more specific thoughts about how to do it that I can share. No worries if not, I can get to it soon-ish.

@pcc
Copy link
Contributor Author

pcc commented Oct 7, 2023

I also considered whether C string reads could be optimized by using/extending the existing callback. But it seemed to me that null-terminated strings were the only special case that could justify having a separate callback, given how common they are in the types of programs that drgn is used to debug, together with the low "granularity" (per byte) of the termination check, which increases the benefit of pushing down optimizations to a lower level.

My concern with larger transfer sizes is similar to what I've seen with e.g. #312: these debug adapters can be highly bandwidth and latency constrained, creating the need to carefully optimize what we send over the wire to the adapter and thence to the target. At least in my experience the strings being transferred are typically relatively short which means that it is better to try to reduce bandwidth in this case.

That's not to say that latency isn't important; quite the contrary, it can have a big impact on long strings. But with a C string specialized approach I think there is an opportunity to further optimize latency for long strings in the debug adapter case. The idea that I had was to put more intelligence into the debug adapter so that it can issue the necessary JTAG/SWD commands to read up to the null terminator without host involvement, so you don't need a USB round trip except when crossing a page boundary. (This could be done by extending an open source debug adapter firmware such as DAPLink, together with the associated protocol standard.) This would be precluded with a short read based approach.

That all being said, I'm not fundamentally opposed to a short read based approach as long as it doesn't regress performance perceptibly in typical remote debugging scenarios. (If you knew anything about me, you would know that I am the last person to try to push for unnecessary optimizations, so I am happy to be proven wrong about my assumptions, but they are based on several months of experience working with these debug adapters.) If you would like to try implementing it, I can try running benchmarks on my setup.

@osandov
Copy link
Owner

osandov commented Oct 9, 2023

Ok, you've convinced me that the extra callback makes sense. In fact, it would also benefit the /proc/kcore and local core dump readers, since those could use my suggestion of reading extra characters but with the additional information they have about what addresses are valid to read from.

I'll come back to this one after #310 and #312, which have some bearing on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants